An Optimized Floating-Point Matrix Multiplication on FPGA
Similar resources
FPGA accelerator for floating-point matrix multiplication
This study presents the architecture and implementation of an FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm, which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering...
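The block scheme described in this abstract can be illustrated in software. The following is a minimal Python sketch, not the accelerator's actual design: the names `blocked_matmul` and `assemble` and the `block` parameter are hypothetical, and a generator stands in for "returning each result block to the host as soon as it is computed".

```python
def blocked_matmul(a, b, block):
    """Yield (row0, col0, c_block) for each result block of a x b.

    a is an n x k matrix and b is a k x m matrix, both as lists of rows.
    Each result block is yielded as soon as it is complete, so the
    producer never holds more than one output block at a time.
    """
    n, k, m = len(a), len(b), len(b[0])
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            i1, j1 = min(i0 + block, n), min(j0 + block, m)
            c = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            # Accumulate over the shared dimension, one block at a time.
            for k0 in range(0, k, block):
                k1 = min(k0 + block, k)
                for i in range(i0, i1):
                    for kk in range(k0, k1):
                        aik = a[i][kk]
                        for j in range(j0, j1):
                            c[i - i0][j - j0] += aik * b[kk][j]
            yield i0, j0, c


def assemble(a, b, block):
    """Host-side stand-in: place each streamed block into the full result."""
    out = [[0.0] * len(b[0]) for _ in range(len(a))]
    for i0, j0, c in blocked_matmul(a, b, block):
        for di, row in enumerate(c):
            out[i0 + di][j0:j0 + len(row)] = row
    return out
```

Calling `assemble(a, b, block)` gives the same product as a plain triple loop; the point of the sketch is only that the producer side needs storage for a single output block.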
Energy Performance of Floating-Point Matrix Multiplication on FPGAs
Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown that implementations of this kernel on FPGAs can achieve high sustained performance [1]. However, to the best of our knowledge, existing work on FPGA-based floating-point matrix multiplication considers the optimization of latency or area only. In this paper, we analyze the impact of various parame...
An Optimized Matrix Multiplication on ARMv7 Architecture
A sufficiently optimized matrix multiplication on embedded systems can facilitate data processing in high-performance mobile measuring equipment, since many core mathematical algorithms are based on matrix multiplication. In this paper, we propose a matrix multiplication specially optimized for the ARMv7 architecture. The performance-critical differences between ARMv7 and conventional des...
Design and Implementation of an Optimized Double Precision Floating Point Divider on FPGA
Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. As a result, little development has taken place in this field. Nowadays, however, the floating-point divider has become indispensable and increasingly important in many modern applications. Most previous implementations required much larger area and latency. In this ...
An Efficient LUT Design on FPGA for Memory-Based Multiplication
An efficient Lookup Table (LUT) design for a memory-based multiplier is proposed. This multiplier can be preferred in DSP computation where one of the inputs to the multiplier, the filter coefficient, is fixed. In this design, all possible product terms of the input multiplicand with the fixed coefficient are stored directly in memory. In contrast to an earlier proposition Odd Multiple Storage ...
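The direct-storage idea in this abstract (every product of the fixed coefficient with a possible input is precomputed and read back by address) can be sketched as follows. This is an illustrative Python model only, assuming unsigned inputs of a given bit width; `build_lut` and `lut_multiply` are hypothetical names, and the abstract's own memory organisation is not reproduced here.

```python
def build_lut(coeff, width):
    """Precompute coeff * x for every unsigned width-bit input x.

    The table has 2**width entries; entry x holds the product coeff * x.
    """
    return [coeff * x for x in range(1 << width)]


def lut_multiply(lut, x):
    """Multiply by the fixed coefficient with a single table read."""
    if not 0 <= x < len(lut):
        raise ValueError("input outside the range covered by the table")
    return lut[x]


# Example: a 4-bit input multiplied by a fixed coefficient of 13.
table = build_lut(13, 4)
product = lut_multiply(table, 11)
```

The table grows as 2 to the power of the input width, which is why the storage cost of such a design matters.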
Journal
Journal title: Information Technology Journal
Year: 2013
ISSN: 1812-5638
DOI: 10.3923/itj.2013.1832.1838